Using phonetic patterns for detecting social cues in natural conversations
Abstract
Laughter and fillers such as “uhm” and “ah” are social cues expressed in human speech. Detecting and interpreting such non-linguistic events can reveal important information about the speakers’ intentions and emotional state. The INTERSPEECH 2013 Social Signals Sub-Challenge sets the task of localizing and classifying laughter and fillers in the “SSPNet Vocalization Corpus” (SVC) based on acoustics. In this paper we investigate phonetic patterns extracted from raw speech transcriptions obtained with the CMU Sphinx toolkit for speech recognition. Even though Sphinx was used out of the box, with no dedicated training on the target classes, we were able to predict laughter and filler frames in the development set with an unweighted average Area Under the Curve (AUC) of about 87%. Combining our features with a set of standard features provided by the challenge organizers raised results above 92%. Applying the combined set to the test corpus, we achieved a highest score of 87.7%, which is 4.4% above the challenge baseline.
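The scores in the abstract are unweighted averages of per-class AUC values. As a rough illustration of that metric, the sketch below computes a one-vs-rest AUC per target class at frame level and averages the classes with equal weight. It is a minimal sketch under stated assumptions, not the challenge's official scoring script: the function names `binary_auc` and `unweighted_average_auc`, the "garbage" label for non-target frames, and the toy scores are all hypothetical and invented for illustration.

```python
from typing import Dict, List, Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive frame outscores
    a random negative frame (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def unweighted_average_auc(frame_labels: List[str],
                           class_scores: Dict[str, List[float]]) -> float:
    """Average the one-vs-rest AUC of each target class with equal weight,
    regardless of how many frames each class has."""
    aucs = []
    for cls, scores in class_scores.items():
        one_vs_rest = [1 if y == cls else 0 for y in frame_labels]
        aucs.append(binary_auc(scores, one_vs_rest))
    return sum(aucs) / len(aucs)


# Toy frame labels and classifier scores (made up for illustration only).
labels = ["laughter", "garbage", "filler", "garbage", "laughter", "filler"]
scores = {
    "laughter": [0.9, 0.2, 0.1, 0.4, 0.3, 0.2],
    "filler":   [0.1, 0.3, 0.8, 0.2, 0.1, 0.7],
}
print(unweighted_average_auc(labels, scores))
```

The pairwise comparison is quadratic in the number of frames, which is fine for a toy example; a rank-based AUC routine from a standard library would be the practical choice for a full corpus.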
Similar resources
Production of English Lexical Stress by Persian EFL Learners
This study examines the phonetic properties of lexical stress in English produced by Persian speakers learning English as a foreign language. The four most reliable phonetic correlates of English lexical stress, namely fundamental frequency, duration, intensity, and vowel quality were measured across Persian speakers’ production of the stressed and unstressed syllables of five English disyllabi...
Detecting Egregious Conversations between Customers and Virtual Agents
Virtual agents are becoming a prominent channel of interaction in customer service. Not all customer interactions are smooth, however, and some can become almost comically bad. In such instances, a human agent might need to step in and salvage the conversation. Detecting bad conversations is important since disappointing customer service may threaten customer loyalty and impact revenue. In this...
Mandarin Conversation: Turn-taking Cues in Exchange Structure
This study addresses turn-taking in everyday Mandarin conversation from a phonetic perspective, in particular through suprasegmental analysis. The acoustic data are based on 103 exchanges from two conversations, one structured and one free. The results show that turn-final cues are mostly signaled via various falling intonation patterns in Mandarin conversation. Some anomalies exist and these are ana...
Detecting Overlapping Communities in Social Networks using Deep Learning
In network analysis, a community is typically thought of as a group of nodes with a high density of edges among themselves and a low density of edges to other parts of the network. Detecting a community structure is important in any network analysis task, especially for revealing patterns between specified nodes. There is a variety of approaches presented in the literature for overlapping...
A Phonetic Investigation of Turn-taking Cues at Multiple Unit-levels in Japanese Conversation
In this paper, we investigate acoustic, prosodic, and syntactic cues at multiple unit-levels for turn-taking in Japanese conversation, proposing an incremental and hierarchical model of turn-projection, which is applicable to both overlapping and non-overlapping speech. Based on a quantitative analysis of Japanese three-party conversations, we identify several turn-taking cues that are located ea...
Publication date: 2013